37 research outputs found

Automatic Classification of Spoken Languages using Diverse Acoustic Features

Author: HaCohen-Kerner Yaakov
Hagege Ruben
Publication date: 01/01/2015

Positive and Negative Sentiment Words in a Blog Corpus Written in Hebrew

Author: Badash Haim
HaCohen-Kerner Yaakov
Publication venue: The Author(s). Published by Elsevier B.V.
Publication date: 31/12/2016

In this research, given a corpus of blog posts written in Hebrew and two seed sentiment lists, we analyze the positive and negative sentences in the corpus, as well as special groups of words that are associated with the positive and negative seed words. We discovered many new negative words (around half of the top 50 words) but only one positive word. Among the top words associated with the positive seed words, we discovered various first-person and third-person pronouns. Intensifiers were found for both the positive and negative seed words. Most of the corpus' sentences are neutral. Among the rest, the rate of positive sentences is above 80%. The sentiment scores of the top words associated with the positive words are significantly higher than those of the top words associated with the negative words. Our conclusions are as follows. Positive sentences "refer to" the authors themselves more often (first-person pronouns and related words) and are also more general, e.g., more related to other people (third-person pronouns), while negative sentences concentrate much more on negative things and therefore contain many new negative words. Israeli bloggers tend to use intensifiers to emphasize or even exaggerate their sentiment opinions (both positive and negative). These bloggers not only write many more positive sentences than negative sentences, but also write much longer positive sentences than negative ones.

Elsevier - Publisher Connector

Distinguishing between True and False Stories using various Linguistic Features

Author: Cohen Daniel Nissim
Dilmon Rakefet
Friedlich Shimon
HaCohen-Kerner Yaakov
Publication date: 01/01/2015

This paper analyzes which linguistic features differentiate true and false stories written in Hebrew. To do so, we defined four feature sets containing 145 features: POS-tags, quantitative, repetition, and special expressions. The examined corpus contains stories composed by 48 native Hebrew speakers who were asked to tell both false and true stories. Classification experiments were run on all possible combinations of these four feature sets using five supervised machine learning methods. The Part of Speech (POS) set was superior to all others and was found to be a key component. The best accuracy result (89.6%) was achieved by a combination of sixteen POS-tags and one quantitative feature.

CiteSeerX

Waseda University Repository

Creating expert knowledge by relying on language learners : a generic approach for mass-producing language resources by combining implicit crowdsourcing and language learning

Author: Aparaschivei Lavina
Barreiro Anabela
Borg Claudia
Cibej Jaka
Forascu Corina
Fort Karen
HaCohen-Kerner Yaakov
Hassan Umair ul
Holdt Spela Arhar
Katinskaia Anisia
Konig Alexander
Kosem Iztok
Lyding Verena
Millour Alice
Nicholas Lionel
Rodosthenous Christos
Sangati Federico
Zdravkova Katerina
Publication venue: 12th edition of the Language Resources and Evaluation Conference (LREC'20)
Publication date: 01/05/2020

In this paper, we introduce a generic approach that combines implicit crowdsourcing and language learning in order to mass-produce language resources (LRs) for any language for which a crowd of language learners can be involved. We present the approach by explaining its core paradigm, which consists in pairing specific types of LRs with specific exercises, by detailing both its strengths and challenges, and by discussing how far these challenges have been addressed at present. Accordingly, we also report on ongoing proof-of-concept efforts aimed at developing the first prototypical implementation of the approach in order to correct and extend an LR called ConceptNet based on the input crowdsourced from language learners. We then present an international network called the European Network for Combining Language Learning with Crowdsourcing Techniques (enetCollect) that provides the context to accelerate the implementation of the generic approach. Finally, we exemplify how it can be used in several language learning scenarios to produce a multitude of NLP resources and how it can therefore alleviate the long-standing NLP issue of the lack of LRs.

OAR@UM

Multiword expressions at length and in depth: Extended papers from the MWE 2017 workshop

Author: Al Saied Hazem
Allen James F.
Alsulaimani Ashjan
Baayen R. Harald
Baldwin Timothy
Barančíková Petra
Bejček Eduard
Bhatia Archna
Brooke Julian
Candito Marie
Cap Fabienne
Chan King
Constant Matthieu
Cook Paul
Dutta Chowdhury Koel
Eryiğit Gülşen
Fazly Afsaneh
Garcia Marcos
Geeraert Kristina
Giouli Voula
HaCohen-Kerner Yaakov
Han Lifeng
Kettnerová Václava
Kovalevskaitė Jolanta
Kovács Viktória
Krek Simon
Liebeskind Chaya
Maldonado Alfredo
Man Teng Choh
Markantonatou Stella
Mititelu Verginica Barbu
Mitkov Ruslan
Monti Johanna
Moreau Erwan
Newman John
Parra Escartín Carla
QasemiZadeh Behrang
Ramisch Carlos
Ricardo Cordeiro Silvio
Rohanian Omid
Salehi Bahar
Sangati Federico
Savary Agata
Scholivet Manon
Simkó Katalina Ilona
Stoyanova Ivelina
Taslimipoor Shiva
van der Plas Lonneke
van Gompel Maarten
Vincze Veronika
Vogel Carl
Čéplö Slavomir
Publication venue: Language Science Press
Publication date: 18/06/2018

The annual workshop on multiword expressions has taken place since 2001 in conjunction with major computational linguistics conferences and attracts the attention of an ever-growing community working on a variety of languages, linguistic phenomena and related computational processing issues. MWE 2017 took place in Valencia, Spain, and represented a vibrant panorama of the current research landscape on the computational treatment of multiword expressions, featuring many high-quality submissions. Furthermore, MWE 2017 included the first shared task on multilingual identification of verbal multiword expressions. The shared task, with extended communal work, has developed important multilingual resources and mobilised several research groups in computational linguistics worldwide. This book contains extended versions of selected papers from the workshop. Authors worked hard to include detailed explanations, broader and deeper analyses, and exciting new results, which were thoroughly reviewed by an internationally renowned committee. We hope that this distinctly joint effort will provide a meaningful and useful snapshot of the multilingual state of the art in multiword expressions modelling and processing, and will be a point of reference for future work.

Language Science Press